The HMR (Human-Mouse-Rat) database is a combination of the BioGRID, CCSB, HPRD, IntAct and MDC databases, developed at the University of Edinburgh. If you are on the Informatics local network, a web service is available to query this database. The web service takes a list of proteins and returns the interactions between them.

Loading Entrez IDs

To cover as many proteins as possible, we use all human Entrez IDs that are known to map to Ensembl IDs. The first step is to load the dictionary that holds this mapping:


In [6]:
import csv, pickle

In [3]:
cd ../../geneconversion/


/home/gavin/Documents/MRes/geneconversion

In [7]:
# Load the Entrez -> Ensembl mapping; pickle files should be opened in binary mode
with open("human.gene2ensemble.pickle", "rb") as f:
    gene2ensembl = pickle.load(f)

Bait and Prey proteins

Since this feature will ultimately be used on the bait and prey proteins, we should make sure that those proteins are included in the list. We load their Entrez Gene IDs and combine all IDs into a single set:


In [20]:
cd ../forGAVIN/pulldown_data/BAITS/


/home/gavin/Documents/MRes/forGAVIN/pulldown_data/BAITS

In [21]:
from itertools import chain

# Flatten all CSV rows into a single list of bait Entrez IDs
with open("baits_entrez_ids.csv") as f:
    baits = list(chain.from_iterable(csv.reader(f)))

In [14]:
cd ../PREYS/


/home/gavin/Documents/MRes/forGAVIN/pulldown_data/PREYS

In [18]:
from itertools import chain

# Flatten all CSV rows into a single list of prey Entrez IDs
with open("prey_entrez_ids.csv") as f:
    preys = list(chain.from_iterable(csv.reader(f)))

In [22]:
# Union of all mapped Entrez IDs with the bait and prey IDs
proteinIDs = set(gene2ensembl) | set(baits) | set(preys)
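
To confirm that the bait and prey proteins really are covered, it can help to list any IDs that are absent from the mapping dictionary. The following is a minimal sketch for Python 3; the in-memory CSV contents, the placeholder Ensembl values and the helper `read_ids` are all hypothetical stand-ins for the real files and dictionary.

```python
import csv
import io
from itertools import chain

# Hypothetical stand-ins for the real dictionary and CSV files.
gene2ensembl = {"1": ["ENSG_placeholder_a"], "2": ["ENSG_placeholder_b"]}
baits_csv = io.StringIO("2\n1742\n")
preys_csv = io.StringIO("1\n2\n2904\n")


def read_ids(handle):
    """Flatten every row of a CSV file into a single list of ID strings."""
    return list(chain.from_iterable(csv.reader(handle)))


baits = read_ids(baits_csv)
preys = read_ids(preys_csv)

# Union of everything we want to submit to the web service.
protein_ids = set(gene2ensembl) | set(baits) | set(preys)

# Bait or prey IDs that would have been missed without the union.
missing_from_mapping = (set(baits) | set(preys)) - set(gene2ensembl)
print(sorted(missing_from_mapping, key=int))
```

Any ID reported here is only present in the submitted list because the baits and preys were added explicitly.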

In [23]:
cd ../../../geneconversion/


/home/gavin/Documents/MRes/geneconversion

Writing protein list

We can then write the combined set of IDs to a file, one ID per line, and paste its contents into the web service form. The list was submitted with the global option, to return the maximum number of interactions.


In [27]:
# Write one Entrez ID per line, ready to paste into the web form
with open("human.entrez.HMR.flat.txt", "w") as f:
    f.write("\n".join(str(p) for p in proteinIDs) + "\n")
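
Before pasting the list into the form, it may be worth reading the file back to check that it holds exactly one ID per line. This is a rough sketch for Python 3 under assumed names: `write_id_list`, `read_id_list`, the temporary path and the three IDs are illustrative and not part of the original workflow.

```python
import os
import tempfile

protein_ids = {"1", "2", "1742"}  # hypothetical IDs for illustration


def write_id_list(ids, path):
    """Write one ID per line, sorted numerically for reproducible output."""
    with open(path, "w") as handle:
        for protein_id in sorted(ids, key=int):
            handle.write("%s\n" % protein_id)


def read_id_list(path):
    """Read the file back, skipping any blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "ids.txt")
write_id_list(protein_ids, path)
round_trip = read_id_list(path)

# Every ID should survive the round trip exactly once.
print(len(round_trip) == len(protein_ids))
```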

Copying into web service

Using xclip, we can copy the contents of the file to the clipboard and paste them into the web service form. The output of the service has been saved as webformoutput.csv in the HMR directory. Looking at this file, it is simply a list of Entrez ID pairs:


In [30]:
cd ../HMR/


/home/gavin/Documents/MRes/HMR

In [31]:
!head webformoutput.csv


1,10321
2,140545
2,126003
2,124912
2,84106
2,64223
2,55729
2,51295
2,51013
2,26085
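
A quick summary of the pair file can show how many interactions and distinct proteins were returned. The sketch below, written for Python 3, uses the first five rows shown above as an in-memory sample rather than the full file; the variable names are illustrative.

```python
import csv
import io

# First five rows of the output shown above, used as a small sample.
sample = io.StringIO("1,10321\n2,140545\n2,126003\n2,124912\n2,84106\n")

pairs = [tuple(row) for row in csv.reader(sample) if row]

# Distinct proteins appearing on either side of a pair.
proteins = {protein for pair in pairs for protein in pair}

# Pairs with order ignored, so "a,b" and "b,a" count once.
unordered = {frozenset(pair) for pair in pairs}

# Rows where a protein is paired with itself.
self_interactions = [pair for pair in pairs if pair[0] == pair[1]]

print(len(pairs), len(proteins), len(unordered), len(self_interactions))
```

If the number of unordered pairs were smaller than the number of rows, the file would contain the same interaction listed in both directions.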

Creating a useable feature

The fastest way to create a useable feature from this is to do the same as in the STRING notebook: pickle an object which returns a 1 if the pair is present and a 0 otherwise. This is slightly sub-optimal, because interactions involving Entrez IDs that we did not supply to the web form will not be represented. However, the list we provided was fairly comprehensive, so almost all human IDs should be covered.
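
The lookup behaviour described above can be illustrated with a small stand-in class. This is only a sketch, not the actual `ocbio.ppipred.features` implementation: the class name `PairFeatures` is hypothetical, and it assumes the second constructor argument is the length of the feature vector returned for unknown pairs.

```python
class PairFeatures(object):
    """Hypothetical stand-in: stored vector for known pairs, zeros otherwise."""

    def __init__(self, featuredict, vector_length):
        self.featuredict = featuredict
        # Assumption: absent pairs get a vector of "0" strings of this length.
        self.default = ["0"] * vector_length

    def __getitem__(self, pair):
        # frozenset makes the lookup independent of the order of the two IDs.
        return self.featuredict.get(frozenset(pair), list(self.default))


known = {frozenset(["1", "10321"]): ["1"]}
sketch = PairFeatures(known, 1)

present = sketch[frozenset(["10321", "1"])]
absent = sketch[frozenset(["1275", "4124"])]
print(present, absent)
```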


In [32]:
import sys

In [33]:
sys.path.append("../opencast-bio/")

In [34]:
import ocbio.ppipred

Loading the interactions as a dictionary


In [35]:
featuredict = {}
with open("webformoutput.csv") as f:
    for line in csv.reader(f):
        # frozenset keys make the lookup independent of the order of the pair
        featuredict[frozenset(line)] = ['1']
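
Using `frozenset` keys has two consequences worth keeping in mind: a pair listed in both directions collapses to a single entry, and a self-interaction becomes a one-element key. A small Python 3 sketch with made-up rows (the reversed row and the `2,2` row are hypothetical, not taken from the real file):

```python
import csv
import io

# Hypothetical rows: one pair, the same pair reversed, and a self-interaction.
rows = io.StringIO("1,10321\n10321,1\n2,2\n")

featuredict = {}
for row in csv.reader(rows):
    featuredict[frozenset(row)] = ["1"]

# The reversed duplicate does not create a second entry.
print(len(featuredict))

# The self-interaction is stored under a single-element key.
print(frozenset(["2"]) in featuredict)
```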

Instantiating features object


In [36]:
features = ocbio.ppipred.features(featuredict,1)

Testing


In [37]:
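# Take an arbitrary key that is known to be in the dictionary
realkey = next(iter(featuredict))
# A pair that is expected to be absent from the HMR output
fakekey = frozenset(["1275","4124"])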

In [38]:
features[realkey]


Out[38]:
['1']

In [39]:
features[fakekey]


Out[39]:
['0']

Pickling

The object is then pickled so it can be easily added to the data source table using the "generator" option.


In [40]:
with open("human.HMR.features.pickle","wb") as f:
    pickle.dump(features,f)
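
When the pickle is loaded again later, the file must be opened in binary mode, and for the real features object the `ocbio.ppipred` module has to be importable, since pickle restores class instances by importing their class. The Python 3 sketch below shows the round trip with a plain dictionary as a stand-in; the temporary path and the single entry are illustrative.

```python
import os
import pickle
import tempfile

# Plain dictionary used as a stand-in for the real features object.
featuredict = {frozenset(["1", "10321"]): ["1"]}

path = os.path.join(tempfile.mkdtemp(), "features.pickle")

# Write in binary mode.
with open(path, "wb") as handle:
    pickle.dump(featuredict, handle, protocol=2)

# Read back in binary mode and compare with the original.
with open(path, "rb") as handle:
    restored = pickle.load(handle)

print(restored == featuredict)
```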